Abstract

A novel detector for multiple-input multiple-output (MIMO) communications
is presented. The algorithm belongs to the class of lattice detectors, i.e., it
finds a reduced-complexity solution to the problem of finding the closest vector
to the received observations. The algorithm achieves optimal maximum-likelihood
(ML) performance in the case of two transmit antennas, while keeping its
complexity much lower than that of exhaustive-search ML detection.

Also, unlike the state-of-the-art lattice detector (namely the sphere decoder), the
proposed algorithm is suitable for a highly parallel hardware architecture and for
reliable generation of bit soft-output information, making it a promising option
for real-time, high-data-rate transmission.

1 Introduction

Wireless transmission through multiple antennas, also referred to as MIMO
(Multiple-Input Multiple-Output), currently enjoys great popularity because of the
demand for high-data-rate communication from multimedia services.

In MIMO fading channels, ML detection is desirable to achieve high performance, as
it is the optimal detection technique in the presence of additive Gaussian noise.
ML detection involves an exhaustive search over all possible sequences of digitally
modulated symbols, whose number grows exponentially with the number of transmit
antennas. Because of their reduced complexity, sub-optimal linear detectors such as
Zero-Forcing (ZF) or Minimum Mean Square Error (MMSE) pQ are widely employed in
wireless communications. Such schemes yield a low spatial diversity order: for a
MIMO system with $L_t$ transmit and $L_r$ receive antennas this is equal to
$L_r - L_t + 1$, as opposed to $L_r$ for ML [2]. ZF and MMSE have also been proposed
in combination with Interference Cancellation (IC) techniques [3]. Such nonlinear
detectors perform better than linear detectors, but do not always achieve near-ML
performance.

Lattice decoding algorithms, such as the sphere decoder (SD) [I], have been proposed
for systems whose input-output relation can be represented as a real-domain linear
model

$$\mathbf{y}_r = \mathbf{B}\mathbf{x}_r + \mathbf{n}_r \qquad (1)$$

where, if $n = 2L_r$ and $m = 2L_t$, the channel output vector is
$\mathbf{y}_r \in \mathbb{R}^n$, the input vector $\mathbf{x}_r \in \mathbb{R}^m$ is
carved from a discrete finite set of values, and $\mathbf{B}$ is an $n \times m$ real
matrix representing the channel mapping of the transmit codebook into a received
lattice corrupted by the Gaussian noise $\mathbf{n}_r \in \mathbb{R}^n$. $\mathbf{B}$
is also referred to as the lattice generator matrix. SD can attain ML performance
with significantly reduced complexity. The lattice formulation for MIMO wireless
systems was described in [5] for the case of QAM digitally modulated transmitted
symbols; in that case a system equation of the form (1) can be derived.
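
To make the closest-vector problem behind (1) concrete, the following Python sketch solves it by exhaustive enumeration for a toy real generator matrix. The function name `closest_lattice_point`, the matrix entries and the binary alphabet are illustrative assumptions and are not taken from the paper; the sketch only shows the brute-force search that lattice detectors are designed to avoid.

```python
import itertools


def closest_lattice_point(B, y, alphabet):
    """Exhaustively solve argmin_x ||y - B x||^2 with every x_i in `alphabet`."""
    m = len(B[0])
    best_x, best_d = None, float("inf")
    for x in itertools.product(alphabet, repeat=m):
        # Squared Euclidean distance between y and the lattice point B x.
        d = sum((yi - sum(b * xi for b, xi in zip(row, x))) ** 2
                for row, yi in zip(B, y))
        if d < best_d:
            best_x, best_d = x, d
    return best_x, best_d


# Toy example (assumed values): 2 x 2 generator matrix, binary alphabet.
B = [[1.0, 0.4], [0.2, 0.9]]
y = [0.7, -1.2]
x_hat, dist = closest_lattice_point(B, y, (-1, 1))
print(x_hat, dist)
```

The number of candidates visited is `len(alphabet) ** m`, which is the exponential growth that motivates reduced-complexity searches.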

Besides SD, to our knowledge the class of ML-approaching algorithms is quite limited.
Other examples include the reduced search set presented in jHj, which does not yield
good performance below $10^{-4}$ BER, and the approximate method of [9], which entails
high complexity; also, no performance results are reported for a constellation size
larger than QPSK.

The SD algorithm converges to the ML solution while searching a much lower number
of lattice points than the exhaustive search required by a "brute-force" ML detector.
However, it has a number of disadvantages; the most important are:

1. Serial structure. It is an inherently serial detector and thus is not suitable
for a parallel implementation.

2. Parameter sensitivity. The number of lattice points to be searched is variable and
sensitive to many parameters, such as the choice of the initial radius, the
signal-to-noise ratio (SNR) and the (fading) channel conditions. This means it could
be unsuitable for applications requiring a real-time response in data communications.

3. Bit soft-output generation. In JT] the idea of building a "candidate list" of
sequences to compute the bit log-likelihood ratios (LLR) was discussed.
Unfortunately, the optimal size of such a list is a function of the system parameters
and can still be very high (thousands of lattice points) for practical applications.

In this paper, we propose a novel layered orthogonal lattice detector (LORD) for
two-transmit-antenna MIMO systems, which achieves ML performance in the case of
hard-output demodulation and optimally computes bit LLRs when soft-output information
is generated. Similarly to SD, LORD consists of three stages: a lattice formulation,
different from the one introduced in [5] and typically used by SD; the preprocessing
of the channel matrix, which is basically an efficient way to perform a QR
decomposition; and finally the lattice search, which finds an optimal solution to the
closest vector problem [0], given the observations. The innovative concept, compared
to SD, is that the search of the lattice points can be carried out in parallel and is
fully deterministic. The number of lattice points to be searched is well below that
of the exhaustive-search ML algorithm, and for soft-output generation it is linear in
the number of transmit antennas.

The paper is organized as follows. In Section 2 we introduce the system notation
used throughout the paper and describe the lattice representation for LORD. In
Section 3 the preprocessing algorithm of the lattice matrix is described. Section 4
details the reduced-complexity ML demodulation technique. Its principles lead to the
formulation of the optimal max-log bit LLR derivation, explained in Section 5.
Section 6 shows the performance results obtained by applying LORD to a BICM system
over a flat Rayleigh fading channel. Finally, Section 7 concludes the paper.



2 System Notation and Lattice Formulation 



The scenario considered in this document is a linear MIMO communication system with
$L_t = 2$ transmit and $L_r$ receive antennas and a frequency-nonselective fading
channel. The information symbol vector $\mathbf{x} = (X_1\ X_2)^T$, where $X_j$,
$j = 1, 2$, is a complex symbol belonging to a given quadrature-amplitude modulation
(QAM) or phase-shift keying (PSK) constellation, is distributed among the two transmit
antennas and synchronously transmitted. The signal received at each antenna is
therefore a superposition of the two transmitted signals corrupted by multiplicative
fading and additive white Gaussian noise (AWGN). The output of the filters matched to
the pulse shape at each receive antenna can be written in matrix notation as:




$$\mathbf{y} = \sqrt{\frac{E_s}{2}}\,\mathbf{H}\mathbf{x} + \mathbf{n} \qquad (2)$$

where $E_s$ is the energy per transmitted symbol (under the hypothesis that the
average constellation energy is $E[|X_j|^2] = 1$); the entries $H_{ji}$ of the
$L_r \times 2$ channel matrix $\mathbf{H}$ represent the complex path gains from
transmit antenna $i$ to receive antenna $j$; $\mathbf{y} = (Y_1 \ldots Y_{L_r})^T$ and
$\mathbf{n} = (N_1 \ldots N_{L_r})^T$ are the $L_r \times 1$ complex received signal
and AWGN sample vectors, respectively. The complex path gains are samples of zero-mean
Gaussian random variables (RVs) with variance $\sigma_H^2 = 0.5$ per real dimension.
Fading processes for different transmit and receive antenna pairs are assumed to be
independent. The noise at each receive antenna is assumed to be independent, with
samples of circularly symmetric zero-mean complex Gaussian RVs with variance $N_0/2$
per dimension.

In the remainder of this paper we will always assume $L_t = 2$. It will prove useful
later to use the notation

$$\mathbf{H} = [\mathbf{h}^c_1\ \ \mathbf{h}^c_2] \qquad (3)$$

where $\mathbf{h}^c_i$ is the complex gain vector from the $i$-th transmit antenna to
the receive antennas.

The present paper deals with a simplified yet optimal method to estimate the transmit
sequence $\mathbf{x}$, i.e., it solves the ML detection problem:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - \sqrt{\frac{E_s}{2}}\,\mathbf{H}\mathbf{x} \right\|^2 \qquad (4)$$

The algorithm deals only with real quantities, i.e., the in-phase (I) and
quadrature-phase (Q) components of the complex quantities in (2). To this end a
suitable lattice representation of the MIMO system is defined:



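$$\mathbf{x}_r = [X_{1,I},\ X_{1,Q},\ X_{2,I},\ X_{2,Q}]^T = [x_1, \ldots, x_4]^T \qquad (5)$$

$$\mathbf{y}_r = [Y_{1,I},\ Y_{1,Q},\ \ldots,\ Y_{L_r,I},\ Y_{L_r,Q}]^T \qquad (6)$$

$$\mathbf{n}_r = [N_{1,I},\ N_{1,Q},\ \ldots,\ N_{L_r,I},\ N_{L_r,Q}]^T \qquad (7)$$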



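Then (2) can be re-written as:

$$\mathbf{y}_r = \sqrt{\frac{E_s}{2}}\,\mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r = \sqrt{\frac{E_s}{2}}\,[\mathbf{h}_1, \ldots, \mathbf{h}_4]\,\mathbf{x}_r + \mathbf{n}_r \qquad (8)$$

$\mathbf{H}_r$ is the real channel matrix, which acts as the lattice generator matrix,
cf. (1). Each pair of columns $(\mathbf{h}_{2k-1}, \mathbf{h}_{2k})$, $k \in \{1, 2\}$,
has the form:

$$\mathbf{h}_{2k-1} = [\Re[H_{1k}],\ \Im[H_{1k}],\ \Re[H_{2k}],\ \Im[H_{2k}],\ \ldots,\ \Re[H_{L_r k}],\ \Im[H_{L_r k}]]^T \qquad (9)$$

$$\mathbf{h}_{2k} = [-\Im[H_{1k}],\ \Re[H_{1k}],\ -\Im[H_{2k}],\ \Re[H_{2k}],\ \ldots,\ -\Im[H_{L_r k}],\ \Re[H_{L_r k}]]^T \qquad (10)$$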



It should be noted that this ordering is slightly different than the ordering used in the 
lattice search literature for multiple antenna communications. This change in ordering 
greatly impacts the complexity and architecture of the ML demodulator. 
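
As an illustration of the ordering in (5)-(10), the Python sketch below builds the real matrix $\mathbf{H}_r$ from a complex channel matrix and checks that the real model reproduces the complex product $\mathbf{H}\mathbf{x}$. The helper names `real_channel_matrix` and `to_iq`, the random seed and the test symbols are assumptions made for this example; the energy scaling factor is left out.

```python
import random


def real_channel_matrix(H):
    """Build the 2*Lr x 2*Lt real matrix H_r from a complex Lr x Lt matrix H.

    The column for X_{k,I} stacks (Re, Im) of column k of H and the column for
    X_{k,Q} stacks (-Im, Re), following (9)-(10).
    """
    n_rx, n_tx = len(H), len(H[0])
    Hr = [[0.0] * (2 * n_tx) for _ in range(2 * n_rx)]
    for j in range(n_rx):
        for k in range(n_tx):
            g = H[j][k]
            Hr[2 * j][2 * k] = g.real           # Re[H_jk] multiplies X_{k,I}
            Hr[2 * j + 1][2 * k] = g.imag       # Im[H_jk] multiplies X_{k,I}
            Hr[2 * j][2 * k + 1] = -g.imag      # -Im[H_jk] multiplies X_{k,Q}
            Hr[2 * j + 1][2 * k + 1] = g.real   # Re[H_jk] multiplies X_{k,Q}
    return Hr


def to_iq(v):
    """Map a complex vector to the interleaved real vector [v1_I, v1_Q, v2_I, ...]."""
    return [part for c in v for part in (c.real, c.imag)]


rng = random.Random(0)
n_rx, n_tx = 3, 2
std = 0.5 ** 0.5  # variance 0.5 per real dimension, as in the system model
H = [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(n_tx)]
     for _ in range(n_rx)]
x = [complex(1, -3), complex(-1, 1)]  # two example QAM symbols

y_complex = [sum(H[j][k] * x[k] for k in range(n_tx)) for j in range(n_rx)]
Hr = real_channel_matrix(H)
y_real = [sum(a * b for a, b in zip(row, to_iq(x))) for row in Hr]

# Largest deviation between the complex model and its real-valued counterpart.
max_err = max(abs(a - b) for a, b in zip(to_iq(y_complex), y_real))
print(max_err)
```

Keeping the I and Q components of each antenna in adjacent columns is what later makes columns $\mathbf{h}_{2k-1}$ and $\mathbf{h}_{2k}$ orthogonal with equal norm.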



3 The Preprocessing Algorithm 



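This section describes an efficient way to preprocess $\mathbf{H}_r$, defined in
(8)-(10). A standard QR decomposition could be applied without impairing the
detection algorithm; the algorithm described below, however, is more efficient because
particular care is taken to avoid unnecessary operations (e.g., vector normalization,
which implies a real division and a square root). Specifically, for $L_r \ge 2$ there
is a $2L_r \times 2L_t$ orthogonal matrix

$$\mathbf{Q} = [\mathbf{h}_1\ \ \mathbf{h}_2\ \ \mathbf{q}_3\ \ \mathbf{q}_4] \qquad (11)$$

where

$$\mathbf{q}_3 = \|\mathbf{h}_1\|^2 \mathbf{h}_3 - (\mathbf{h}_1^T\mathbf{h}_3)\,\mathbf{h}_1 - (\mathbf{h}_2^T\mathbf{h}_3)\,\mathbf{h}_2 \qquad (12)$$

$$\mathbf{q}_4 = \|\mathbf{h}_1\|^2 \mathbf{h}_4 - (\mathbf{h}_1^T\mathbf{h}_4)\,\mathbf{h}_1 - (\mathbf{h}_2^T\mathbf{h}_4)\,\mathbf{h}_2 \qquad (13)$$

such that

$$\mathbf{Q}^T\mathbf{Q} = \mathrm{diag}\left[\|\mathbf{h}_1\|^2,\ \|\mathbf{h}_1\|^2,\ \|\mathbf{q}_3\|^2,\ \|\mathbf{q}_3\|^2\right] \qquad (14)$$

It should be noted that

$$\|\mathbf{q}_3\|^2 = \|\mathbf{h}_1\|^2\, r_3 \qquad (15)$$

where

$$r_3 = \|\mathbf{h}_3\|^2 \|\mathbf{h}_1\|^2 - (\mathbf{h}_1^T\mathbf{h}_3)^2 - (\mathbf{h}_2^T\mathbf{h}_3)^2. \qquad (16)$$

There is a $2L_t \times 2L_t$ upper triangular matrix

$$\mathbf{R} = \begin{bmatrix} 1 & 0 & \mathbf{h}_1^T\mathbf{h}_3 & \mathbf{h}_1^T\mathbf{h}_4 \\ 0 & 1 & \mathbf{h}_2^T\mathbf{h}_3 & \mathbf{h}_2^T\mathbf{h}_4 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (17)$$

There is a $2L_t \times 2L_t$ diagonal matrix

$$\mathbf{\Lambda}_q = \mathrm{diag}\left[1,\ 1,\ \|\mathbf{h}_1\|^{-2},\ \|\mathbf{h}_1\|^{-2}\right] \qquad (18)$$

These three matrices are related to the original real channel matrix as

$$\mathbf{H}_r = \mathbf{Q}\mathbf{R}\mathbf{\Lambda}_q. \qquad (19)$$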



It should be noted that if $L_r = 1$ then the bottom two rows of $\mathbf{R}$ will be
eliminated, but the same general form will hold for the top two rows.
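
The orthogonality claimed for the preprocessing can be checked numerically. The sketch below, written under the assumption that the columns of the real channel matrix are ordered as in (9)-(10), forms the two unnormalized vectors orthogonal to the first antenna's columns without any division or square root and verifies that the Gram matrix of the four resulting vectors is diagonal. The function names and the example channel values are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def real_columns(H):
    """Columns h1..h4 of the real channel matrix for a complex Lr x 2 matrix H."""
    cols = []
    for k in range(2):
        odd = [part for row in H for part in (row[k].real, row[k].imag)]
        even = [part for row in H for part in (-row[k].imag, row[k].real)]
        cols += [odd, even]
    return cols


def preprocess(h1, h2, h3, h4):
    """Division-free orthogonalization: returns q3, q4 and the scalar r3."""
    n1 = dot(h1, h1)
    a3, b3 = dot(h1, h3), dot(h2, h3)
    a4, b4 = dot(h1, h4), dot(h2, h4)
    q3 = [n1 * w - a3 * u - b3 * v for u, v, w in zip(h1, h2, h3)]
    q4 = [n1 * w - a4 * u - b4 * v for u, v, w in zip(h1, h2, h4)]
    r3 = dot(h3, h3) * n1 - a3 ** 2 - b3 ** 2
    return q3, q4, r3


# Example channel with Lr = 3 receive antennas (assumed values).
H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 + 0.1j], [0.8 - 0.4j, -1.1 + 0.6j]]
h1, h2, h3, h4 = real_columns(H)
q3, q4, r3 = preprocess(h1, h2, h3, h4)

Q = [h1, h2, q3, q4]
gram = [[dot(a, b) for b in Q] for a in Q]  # entries of Q^T Q
n1 = dot(h1, h1)
expected_diag = [n1, n1, n1 * r3, n1 * r3]

off_diag = max(abs(gram[i][j]) for i in range(4) for j in range(4) if i != j)
diag_err = max(abs(gram[i][i] - expected_diag[i]) for i in range(4))
print(r3, off_diag, diag_err)
```

Because the columns are left unnormalized, the diagonal of the Gram matrix carries the scale factors that later appear as unequal noise variances.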

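Because of this structure the detection problem on the MIMO channel can be
transformed into a structure suitable for lattice search algorithms. To this end,
consider the $4 \times 1$ vector

$$\tilde{\mathbf{y}} = \mathbf{Q}^T \mathbf{y}_r = \sqrt{\frac{E_s}{2}}\,\tilde{\mathbf{R}}\mathbf{x}_r + \mathbf{Q}^T\mathbf{n}_r = \sqrt{\frac{E_s}{2}}\,\tilde{\mathbf{R}}\mathbf{x}_r + \tilde{\mathbf{n}} \qquad (20)$$

where

$$\tilde{\mathbf{R}} = \mathbf{Q}^T\mathbf{Q}\mathbf{R}\mathbf{\Lambda}_q = \begin{bmatrix} \|\mathbf{h}_1\|^2 & 0 & \mathbf{h}_1^T\mathbf{h}_3 & \mathbf{h}_1^T\mathbf{h}_4 \\ 0 & \|\mathbf{h}_1\|^2 & \mathbf{h}_2^T\mathbf{h}_3 & \mathbf{h}_2^T\mathbf{h}_4 \\ 0 & 0 & r_3 & 0 \\ 0 & 0 & 0 & r_3 \end{bmatrix} \qquad (21)$$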



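The noise vector in the triangular model still has independent components, but the
components have unequal variances, i.e.,

$$E[\tilde{\mathbf{n}}\tilde{\mathbf{n}}^T] = \frac{N_0}{2}\,\mathrm{diag}\left[\|\mathbf{h}_1\|^2,\ \|\mathbf{h}_1\|^2,\ \|\mathbf{h}_1\|^2 r_3,\ \|\mathbf{h}_1\|^2 r_3\right] \qquad (22)$$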



The interesting characteristic of the model formulation in this manner is that each of the 
I and Q components of each transmitted signal are broken into orthogonal dimensions 
and can be searched in an independent fashion. 

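All parameters needed in this triangularized model are a function of eight variables.
Four of the variables are functions of the channel only, i.e.,

$$\sigma_1^2 = \|\mathbf{h}_1\|^2 \qquad \sigma_2^2 = \|\mathbf{h}_3\|^2 \qquad s_1 = \mathbf{h}_1^T\mathbf{h}_3 \qquad s_2 = \mathbf{h}_1^T\mathbf{h}_4 \qquad (23)$$

and four are functions of the channel and the observations, i.e.,

$$V_1 = \mathbf{h}_1^T\mathbf{y}_r \qquad V_2 = \mathbf{h}_2^T\mathbf{y}_r \qquad V_3 = \mathbf{h}_3^T\mathbf{y}_r \qquad V_4 = \mathbf{h}_4^T\mathbf{y}_r \qquad (24)$$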



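It should be noted that $\|\mathbf{h}_1\|^2 = \|\mathbf{h}_2\|^2$,
$\|\mathbf{h}_3\|^2 = \|\mathbf{h}_4\|^2$,
$\mathbf{h}_1^T\mathbf{h}_3 = \mathbf{h}_2^T\mathbf{h}_4$, and that
$\mathbf{h}_1^T\mathbf{h}_4 = -\mathbf{h}_2^T\mathbf{h}_3$. The last two equalities
imply that the $2 \times 2$ matrix in the upper right corner of $\tilde{\mathbf{R}}$
is a (scaled) rotation matrix. Specifically, substituting (23)-(24) into (20)-(21),
the required result for the upper triangular formulation is

$$\tilde{\mathbf{R}} = \begin{bmatrix} \sigma_1^2 & 0 & s_1 & s_2 \\ 0 & \sigma_1^2 & -s_2 & s_1 \\ 0 & 0 & r_3 & 0 \\ 0 & 0 & 0 & r_3 \end{bmatrix}, \qquad \tilde{\mathbf{y}} = \begin{bmatrix} V_1 \\ V_2 \\ \sigma_1^2 V_3 - s_1 V_1 + s_2 V_2 \\ \sigma_1^2 V_4 - s_2 V_1 - s_1 V_2 \end{bmatrix}, \qquad r_3 = \sigma_1^2\sigma_2^2 - s_1^2 - s_2^2 \qquad (25)$$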

This formulation greatly simplifies the lattice search formulation and results in a
preprocessing complexity that is $O(16 L_r)$.
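
A minimal sketch of the preprocessing in terms of the eight variables is given below. It assumes the sign conventions that follow from (9)-(10), namely $\mathbf{h}_2^T\mathbf{h}_3 = -s_2$ and $\mathbf{h}_2^T\mathbf{h}_4 = s_1$, omits the energy scaling factor, and uses a noise-free observation so that the transformed vector can be compared directly with the triangular matrix applied to $\mathbf{x}_r$. The function name `triangular_model` and the numerical values are illustrative. One way to read the stated $O(16 L_r)$ figure, offered here as an interpretation, is eight inner products of length $2L_r$ each.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def real_columns(H):
    """Columns h1..h4 of the real channel matrix for a complex Lr x 2 matrix H."""
    cols = []
    for k in range(2):
        odd = [part for row in H for part in (row[k].real, row[k].imag)]
        even = [part for row in H for part in (-row[k].imag, row[k].real)]
        cols += [odd, even]
    return cols


def triangular_model(H, y):
    """Return the upper triangular matrix and the transformed observation vector."""
    h1, h2, h3, h4 = real_columns(H)
    y_r = [part for v in y for part in (v.real, v.imag)]
    # Four channel-only variables.
    sigma1_sq, sigma2_sq = dot(h1, h1), dot(h3, h3)
    s1, s2 = dot(h1, h3), dot(h1, h4)
    # Four variables that also depend on the observations.
    V = [dot(h, y_r) for h in (h1, h2, h3, h4)]
    r3 = sigma1_sq * sigma2_sq - s1 * s1 - s2 * s2
    R = [[sigma1_sq, 0.0, s1, s2],
         [0.0, sigma1_sq, -s2, s1],
         [0.0, 0.0, r3, 0.0],
         [0.0, 0.0, 0.0, r3]]
    y_t = [V[0],
           V[1],
           sigma1_sq * V[2] - s1 * V[0] + s2 * V[1],
           sigma1_sq * V[3] - s2 * V[0] - s1 * V[1]]
    return R, y_t


# Noise-free check with assumed channel and symbol values.
H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 + 0.1j]]
x = [1 - 3j, -1 + 1j]
y = [sum(h * s for h, s in zip(row, x)) for row in H]

R, y_t = triangular_model(H, y)
x_r = [1.0, -3.0, -1.0, 1.0]  # [X1_I, X1_Q, X2_I, X2_Q]
predicted = [dot(row, x_r) for row in R]
max_err = max(abs(a - b) for a, b in zip(y_t, predicted))
print(max_err)
```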

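As will prove useful when dealing with soft-output generation, reversing the order of
the transmit antennas results in a similar model. When the order of the transmit
antennas is reversed, i.e., the real symbol vector is taken as
$\mathbf{x}_s = [x_3\ x_4\ x_1\ x_2]^T$, the model becomes

$$\tilde{\mathbf{y}}_s = \sqrt{\frac{E_s}{2}}\,\tilde{\mathbf{R}}_s\mathbf{x}_s + \tilde{\mathbf{n}}_s, \qquad \tilde{\mathbf{R}}_s = \begin{bmatrix} \sigma_2^2 & 0 & s_1 & -s_2 \\ 0 & \sigma_2^2 & s_2 & s_1 \\ 0 & 0 & r_3 & 0 \\ 0 & 0 & 0 & r_3 \end{bmatrix} \qquad (26)$$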



4 ML Demodulation 



In this section we describe how the system equations defined in Section 3 lead to a
simplified yet optimal ML demodulation. Consider a PSK or QAM constellation of size
$S$.

For the sake of conciseness, the discussion here will assume that $M^2$-QAM modulation
is used on each antenna, but the derivation is valid, with straightforward
generalizations, for any complex constellation. The optimum ML word demodulator (4)
would have to compute the ML metric for $M^{2L_t}$ constellation points and has a
complexity $O(M^4)$ for $L_t = 2$.

The notation used in the sequel is that $\Omega_x$ will refer to the $M$-PAM
constellation for each real dimension. Given the formulation in (20)-(25) and
neglecting scalar energy normalization factors for simplicity, the ML decision metric
becomes

$$T(\mathbf{x}_r) = -\frac{(\tilde{y}_1 - \sigma_1^2 x_1 - s_1 x_3 - s_2 x_4)^2}{\sigma_1^2} - \frac{(\tilde{y}_2 - \sigma_1^2 x_2 + s_2 x_3 - s_1 x_4)^2}{\sigma_1^2} - \frac{(\tilde{y}_3 - r_3 x_3)^2}{\sigma_1^2 r_3} - \frac{(\tilde{y}_4 - r_3 x_4)^2}{\sigma_1^2 r_3} \qquad (27)$$



The ML demodulator finds the maximum value of the metric over all possible values of
the sequence $\mathbf{x}_r$. This search can be greatly simplified by noting that, for
given values of $x_3$ and $x_4$, the maximum-likelihood metric reduces to

$$T(\mathbf{x}_r) = -\frac{(\tilde{y}_1 - \sigma_1^2 x_1 - C_1(x_3, x_4))^2}{\sigma_1^2} - \frac{(\tilde{y}_2 - \sigma_1^2 x_2 - C_2(x_3, x_4))^2}{\sigma_1^2} - C_3(x_3, x_4) \qquad (28)$$

where

$$C_1(x_3, x_4) = s_1 x_3 + s_2 x_4 \qquad C_2(x_3, x_4) = -s_2 x_3 + s_1 x_4 \qquad C_3(x_3, x_4) = \frac{(\tilde{y}_3 - r_3 x_3)^2 + (\tilde{y}_4 - r_3 x_4)^2}{\sigma_1^2 r_3} \qquad (29)$$



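It is clear from examining (28) that, due to the orthogonality of the problem
formulation, the conditional ML decision on $x_1$ and $x_2$ can immediately be made by
a simple threshold test, i.e.,

$$\hat{x}_1(x_3, x_4) = \mathrm{round}\left(\frac{\tilde{y}_1 - C_1(x_3, x_4)}{\sigma_1^2}\right) \qquad \hat{x}_2(x_3, x_4) = \mathrm{round}\left(\frac{\tilde{y}_2 - C_2(x_3, x_4)}{\sigma_1^2}\right) \qquad (30)$$

The round operation is a simple slicing operation to the constellation elements of
$\Omega_x$. The final ML estimate is then given as

$$\{\hat{x}_1(\hat{x}_3, \hat{x}_4),\ \hat{x}_2(\hat{x}_3, \hat{x}_4),\ \hat{x}_3,\ \hat{x}_4\} = \arg\max_{x_3, x_4 \in \Omega_x^2} \left\{ -\frac{(\tilde{y}_1 - \sigma_1^2 \hat{x}_1(x_3, x_4) - C_1(x_3, x_4))^2}{\sigma_1^2} - \frac{(\tilde{y}_2 - \sigma_1^2 \hat{x}_2(x_3, x_4) - C_2(x_3, x_4))^2}{\sigma_1^2} - C_3(x_3, x_4) \right\} \qquad (31)$$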
This implies that the number of points that have to be searched in this formulation to
find the true ML estimate is $M^2$ (with two slicing operations per searched point)
and not $M^4$. This is a significant saving in complexity.
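
The reduced search can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: 4-PAM per real dimension (16-QAM per antenna), the energy scaling factor omitted, and hypothetical function names `lord_detect` and `exhaustive_ml`. For each of the $M^2$ candidate pairs $(x_3, x_4)$ the two remaining components are obtained by slicing, and the result is compared with an exhaustive search over all $M^4$ points.

```python
import itertools
import random

PAM = (-3, -1, 1, 3)  # assumed M = 4 PAM alphabet per real dimension


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def real_columns(H):
    cols = []
    for k in range(2):
        odd = [part for row in H for part in (row[k].real, row[k].imag)]
        even = [part for row in H for part in (-row[k].imag, row[k].real)]
        cols += [odd, even]
    return cols


def slice_to_pam(v):
    """Nearest PAM point: the 'round' (slicing) operation."""
    return min(PAM, key=lambda p: abs(p - v))


def lord_detect(H, y):
    h1, h2, h3, h4 = real_columns(H)
    y_r = [part for v in y for part in (v.real, v.imag)]
    sig1, s1, s2 = dot(h1, h1), dot(h1, h3), dot(h1, h4)
    r3 = sig1 * dot(h3, h3) - s1 * s1 - s2 * s2
    V = [dot(h, y_r) for h in (h1, h2, h3, h4)]
    yt = [V[0], V[1],
          sig1 * V[2] - s1 * V[0] + s2 * V[1],
          sig1 * V[3] - s2 * V[0] - s1 * V[1]]
    best, best_metric = None, float("-inf")
    for x3, x4 in itertools.product(PAM, repeat=2):  # M^2 candidates
        c1, c2 = s1 * x3 + s2 * x4, -s2 * x3 + s1 * x4
        c3 = ((yt[2] - r3 * x3) ** 2 + (yt[3] - r3 * x4) ** 2) / (sig1 * r3)
        x1 = slice_to_pam((yt[0] - c1) / sig1)  # independent I decision
        x2 = slice_to_pam((yt[1] - c2) / sig1)  # independent Q decision
        metric = -((yt[0] - sig1 * x1 - c1) ** 2
                   + (yt[1] - sig1 * x2 - c2) ** 2) / sig1 - c3
        if metric > best_metric:
            best, best_metric = (x1, x2, x3, x4), metric
    return best


def exhaustive_ml(H, y):
    cols = real_columns(H)
    y_r = [part for v in y for part in (v.real, v.imag)]

    def dist(x):
        return sum((yi - sum(xi * col[i] for xi, col in zip(x, cols))) ** 2
                   for i, yi in enumerate(y_r))

    return min(itertools.product(PAM, repeat=4), key=dist)  # M^4 candidates


rng = random.Random(2)
mismatches = 0
for _ in range(30):
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
         for _ in range(2)]
    x = [complex(rng.choice(PAM), rng.choice(PAM)) for _ in range(2)]
    y = [sum(h * s for h, s in zip(row, x))
         + complex(rng.gauss(0, 1), rng.gauss(0, 1)) for row in H]
    if lord_detect(H, y) != exhaustive_ml(H, y):
        mismatches += 1
print(mismatches)
```

With these assumptions the layered search visits 16 candidate pairs per received vector, against 256 points for the exhaustive search.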

Examining (30) and (31) shows that this reduced-complexity ML demodulation is a direct
consequence of the reordered lattice formulation. Recall that each group of two rows
in the model corresponds to a transmit antenna. Equation (30) shows that at the top of
the triangularized model the decisions for the first transmit antenna can be made
independently for the I and the Q modulation. This was not true for the traditional
lattice formulation [5], as after the triangularization the higher rows become
dependent on all the lower layers of the transmit modulation. Retaining this
orthogonalization greatly simplifies the optimal search and has important implications
for suboptimal searches. This orthogonalization also greatly facilitates parallel
searches, removing one of the drawbacks of SD, i.e., the fact that the search must be
performed in a recursive fashion.



5 LLR generation 



The problem is first recalled here for the complex-domain system (2). Consider the
information symbol vector $\mathbf{x} = (X_1\ X_2)^T$, where $X_j$, $j = 1, 2$, is a
complex symbol belonging to a given $M^2$-QAM constellation, and let $M_c$ be the
number of bits per symbol. The bit soft-output information is the a-posteriori
probability (APP) ratio of the bit $b_k$, $k = 1, \ldots, 2M_c$, conditioned on the
received channel symbol vector $\mathbf{y}$; it is often expressed in the logarithmic
domain (log-likelihood ratio, LLR) as:

$$L(b_k|\mathbf{y}) = \ln\frac{P(b_k = 1|\mathbf{y})}{P(b_k = 0|\mathbf{y})} = \ln\frac{\sum_{\mathbf{x} \in S(k)^+} P(\mathbf{y}|\mathbf{x})\,P_a(\mathbf{x})}{\sum_{\mathbf{x} \in S(k)^-} P(\mathbf{y}|\mathbf{x})\,P_a(\mathbf{x})} \qquad (32)$$

where $S(k)^+$ is the set of $2^{2M_c - 1}$ bit sequences having $b_k = 1$, and
similarly $S(k)^-$ is the set of bit sequences having $b_k = 0$; $P_a(\mathbf{x})$
represents the a-priori probabilities of $\mathbf{x}$, which can be neglected in the
case of equiprobable transmit symbols, as is the case in this paper. In the general
case, the likelihood function $P(\mathbf{y}|\mathbf{x})$ can be derived from (2):



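$$P(\mathbf{y}|\mathbf{x}) \propto \exp\left[-\frac{1}{2\sigma^2}\left\|\mathbf{y} - \sqrt{\frac{E_s}{2}}\,\mathbf{H}\mathbf{x}\right\|^2\right] = \exp[-D(\mathbf{x})] \qquad (33)$$

where $\sigma^2 = N_0/2$ and $D(\mathbf{x})$ is the Euclidean distance term. The
summation of exponentials involved in (32) is often approximated according to the
following so-called max-log approximation:

$$\ln\sum_{\mathbf{x}} \exp[-D(\mathbf{x})] \approx \ln\max_{\mathbf{x}} \exp[-D(\mathbf{x})] = -\min_{\mathbf{x}} D(\mathbf{x}) \qquad (34)$$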



Alternatively, it is possible to compute (32) exactly through the "Jacobian logarithm"
or max* function

$$\mathrm{jacln}(a, b) := \ln[\exp(a) + \exp(b)] = \max(a, b) + \ln[1 + \exp(-|a - b|)]. \qquad (35)$$
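
The difference between the two options can be seen on a small numerical example. The sketch below implements the max* function with the Python standard library and compares the exact log-sum with the max-log value for a handful of distance terms; the function names and the distance values are assumptions chosen for illustration.

```python
import math


def jacln(a, b):
    """Jacobian logarithm (max*): ln(exp(a) + exp(b)) computed stably."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def log_sum_exp(values):
    """Exact ln(sum(exp(v))) obtained by applying jacln pairwise."""
    acc = values[0]
    for v in values[1:]:
        acc = jacln(acc, v)
    return acc


distances = [1.2, 3.5, 0.4, 7.0]  # example Euclidean distance terms
exact = log_sum_exp([-d for d in distances])
approx = max(-d for d in distances)  # max-log: -min of the distances
print(exact, approx)
```

The max-log value never exceeds the exact one, and the gap shrinks when one distance is clearly smaller than the others.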

Simulations reported in the literature show that the performance degradation due to
the max-log approximation is generally very small compared to the use of the max*
function. Using (34), (32) can then be written as:

$$L(b_k|\mathbf{y}) \approx \min_{\mathbf{x} \in S(k)^-} D(\mathbf{x}) - \min_{\mathbf{x} \in S(k)^+} D(\mathbf{x}) \qquad (36)$$

In the sequel we show how LORD can provide an exact computation of (36), but with a
much lower complexity than exhaustive-search ML. The computation of (36) requires
identification of the most likely lattice point with $b_k = 1$ and the most likely
lattice point with $b_k = 0$ for each bit index $k = 1, \ldots, 2M_c$. The problem in
the case of the SD algorithm is clearly stated in [TT]. By definition, one of the two
sequences is the (optimum) hard-decision ML solution of (4). However, using SD, there
is no guarantee that the other sequence is one of the valid lattice points found by SD
during the lattice search.

LORD does not have this problem when generating LLRs. To show this, let us consider
the bits corresponding to the complex symbol $X_2$ in the symbol sequence
$\mathbf{x} = (X_1\ X_2)^T$. After the lattice representation is derived, from (20)
and (27) the likelihood function $P(\mathbf{y}|\mathbf{x}_r)$ is given by:

$$P(\mathbf{y}|\mathbf{x}_r) = \exp[-|T(\mathbf{x}_r)|]. \qquad (37)$$



where $T(\mathbf{x}_r)$ is defined in (27). Using arguments similar to those that led
to the simplified ML demodulation (28)-(30), one can easily prove that the two
sequences needed for every bit in $X_2$ are certainly found by minimizing
$|T(\mathbf{x}_r)|$ from (27) over the possible $M^2$ values of $(x_3, x_4)$ and
performing a simple slicing operation to the constellation elements of $\Omega_x$;
thus the desired couples $(x_1, x_2)$ are uniquely determined for every $(x_3, x_4)$.
Since $T(\mathbf{x}_r) \le 0$, $|T(\mathbf{x}_r)|$ plays the role of the distance term
$D(\mathbf{x})$, and equation (36) can then be written as:

$$L(b_{2,k}|\mathbf{y}) \approx \min_{x_3, x_4 \in S(k)_2^-} |T(\mathbf{x}_r)| - \min_{x_3, x_4 \in S(k)_2^+} |T(\mathbf{x}_r)| \qquad (38)$$

where $b_{2,k}$ are the bits corresponding to $X_2$, $k = 1, \ldots, M_c$, and
$S(k)_2^+$ ($S(k)_2^-$) is the set of $2^{M_c - 1}$ bit sequences having
$b_{2,k} = 1$ ($b_{2,k} = 0$).

The computation of the LLRs for the bits corresponding to the symbol $X_1$, i.e., to
$(x_1, x_2)$, can be obtained by simply reordering the model and repeating the LORD
processing. To this end, denote by $\mathbf{x}_s = [x_3\ x_4\ x_1\ x_2]^T$ the
reordered real modulation symbols; the LLR is then given by

$$L(b_{1,k}|\mathbf{y}) \approx \min_{x_1, x_2 \in S(k)_1^-} |T'(\mathbf{x}_s)| - \min_{x_1, x_2 \in S(k)_1^+} |T'(\mathbf{x}_s)| \qquad (39)$$

where $b_{1,k}$ are the bits corresponding to $X_1$, $k = 1, \ldots, M_c$, and
$S(k)_1^+$ ($S(k)_1^-$) is the set of $2^{M_c - 1}$ bit sequences having
$b_{1,k} = 1$ ($b_{1,k} = 0$). The reordered ML decision metric becomes

$$T'(\mathbf{x}_s) = -\frac{(\tilde{y}_{s,1} - \sigma_2^2 x_3 - s_1 x_1 + s_2 x_2)^2}{\sigma_2^2} - \frac{(\tilde{y}_{s,2} - \sigma_2^2 x_4 - s_2 x_1 - s_1 x_2)^2}{\sigma_2^2} - \frac{(\tilde{y}_{s,3} - r_3 x_1)^2}{\sigma_2^2 r_3} - \frac{(\tilde{y}_{s,4} - r_3 x_2)^2}{\sigma_2^2 r_3} \qquad (40)$$

in direct analogy to (27). By comparing the quantities required by the reordered model
with (23), it should be clear that the processing required by the reordered sequence
involves a low amount of extra complexity. Besides, the LLR computation for the bits
corresponding to symbols $X_1$ and $X_2$ can be carried out in parallel. Thus, we have
derived an exact max-log bit LLR computation using two layer orderings and an overall
lattice search over $2M^2$ sequences instead of $M^4$, as would be required by the
exhaustive-search ML algorithm.
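
The two-ordering LLR computation can be sketched as follows. The example is a hedged illustration rather than the authors' implementation: it assumes 16-QAM with a Gray labelling per PAM dimension (the `GRAY` table is hypothetical), leaves out the $1/(2\sigma^2)$ and energy scaling factors so that the LLRs are unnormalized, and uses made-up channel and noise values. Each ordering visits $M^2$ candidate pairs, and the result is compared with max-log LLRs obtained from all $M^4$ points.

```python
import itertools

PAM = (-3, -1, 1, 3)
GRAY = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}  # assumed bit labelling
INF = float("inf")


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def real_columns(H):
    cols = []
    for k in range(2):
        odd = [part for row in H for part in (row[k].real, row[k].imag)]
        even = [part for row in H for part in (-row[k].imag, row[k].real)]
        cols += [odd, even]
    return cols


def slice_to_pam(v):
    return min(PAM, key=lambda p: abs(p - v))


def bottom_layer_distances(H, y):
    """For each (I, Q) value of the antenna-2 symbol, the smallest |T| over the
    antenna-1 symbol, which is found by slicing."""
    h1, h2, h3, h4 = real_columns(H)
    y_r = [part for v in y for part in (v.real, v.imag)]
    sig1, s1, s2 = dot(h1, h1), dot(h1, h3), dot(h1, h4)
    r3 = sig1 * dot(h3, h3) - s1 * s1 - s2 * s2
    V = [dot(h, y_r) for h in (h1, h2, h3, h4)]
    yt = [V[0], V[1],
          sig1 * V[2] - s1 * V[0] + s2 * V[1],
          sig1 * V[3] - s2 * V[0] - s1 * V[1]]
    dist = {}
    for xa, xb in itertools.product(PAM, repeat=2):
        e1 = yt[0] - (s1 * xa + s2 * xb)
        e2 = yt[1] - (-s2 * xa + s1 * xb)
        e1 -= sig1 * slice_to_pam(e1 / sig1)
        e2 -= sig1 * slice_to_pam(e2 / sig1)
        tail = ((yt[2] - r3 * xa) ** 2 + (yt[3] - r3 * xb) ** 2) / (sig1 * r3)
        dist[(xa, xb)] = (e1 * e1 + e2 * e2) / sig1 + tail
    return dist


def symbol_llrs(dist):
    """Max-log LLRs of the four bits labelling one 16-QAM symbol."""
    llrs = []
    for k in range(4):
        best = [INF, INF]  # smallest distance with bit k equal to 0 / to 1
        for (xa, xb), d in dist.items():
            bit = (GRAY[xa] + GRAY[xb])[k]
            best[bit] = min(best[bit], d)
        llrs.append(best[0] - best[1])
    return llrs


def lord_llrs(H, y):
    swapped = [[row[1], row[0]] for row in H]  # reversed antenna ordering
    return (symbol_llrs(bottom_layer_distances(swapped, y))   # bits of X1
            + symbol_llrs(bottom_layer_distances(H, y)))      # bits of X2


def exhaustive_llrs(H, y):
    cols = real_columns(H)
    y_r = [part for v in y for part in (v.real, v.imag)]
    d1, d2 = {}, {}
    for x in itertools.product(PAM, repeat=4):
        d = sum((yi - sum(xi * col[i] for xi, col in zip(x, cols))) ** 2
                for i, yi in enumerate(y_r))
        d1[x[:2]] = min(d1.get(x[:2], INF), d)
        d2[x[2:]] = min(d2.get(x[2:], INF), d)
    return symbol_llrs(d1) + symbol_llrs(d2)


H = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.7j, 2 + 0.1j], [0.8 - 0.4j, -1.1 + 0.6j]]
x = [1 - 3j, -1 + 1j]
noise = [0.3 - 0.2j, -0.5 + 0.1j, 0.2 + 0.4j]
y = [sum(h * s for h, s in zip(row, x)) + n for row, n in zip(H, noise)]

fast, slow = lord_llrs(H, y), exhaustive_llrs(H, y)
max_err = max(abs(a - b) for a, b in zip(fast, slow))
print(max_err)
```

With three receive antennas the exhaustive distances differ from the layered metric by a constant that does not depend on the candidate, so it cancels in every LLR difference.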



6 Performance results 

In this section two examples are presented, with the corresponding performance
evaluated by simulation. The examples are limited to frequency-flat and time-flat
fading for simplicity, but this does not limit how LORD can be applied in MIMO
detection. The first example is an uncoded $L_t = 2$ system using 64QAM modulation.
The maximum-likelihood detection word error probability in spatially white Rayleigh
fading is shown in Fig. 1. The ML performance is compared to that of the
well-understood zero-forcing (ZF) detector. The advantage of ML detection over linear
detection in terms of diversity is evident from these plots.

A second system consists of a bit-interleaved coded modulation (BICM) scheme with a
frame size of 144 coded bits using 64QAM modulation on each antenna. The interleaver
used was a 12 x 12 block interleaver, so that adjacent bits could be permuted to a
different antenna and a different significant bit of the QAM modulation while being in
different time slots. The convolutional code is the standard 64-state rate-1/2 binary
convolutional code with octal generators (133, 171). The performance is shown in
Fig. 2 for $L_t = 2$, $L_r = 2$, and spatially white Rayleigh fading, for both hard and
max-log inner bit metric generation. This figure demonstrates that LORD can exactly
compute the max-log LLR at a low complexity when used in a concatenated MIMO coded
modulation.

Figure 1: The decoding performance for uncoded MIMO transmission. $L_t = 2$ and 12
bits per channel use. a) Word error probability. b) Bit error probability.

Figure 2: The BICM decoding performance. $L_t = 2$, $L_r = 2$ and 6 bits per channel
use. a) Word error probability. b) Bit error probability.

7 Conclusion 

We have presented a novel lattice-search-based MIMO detection algorithm for two
transmit antennas, characterized by a low preprocessing complexity, that achieves
optimal ML demodulation with a complexity of the order of $S$, where $S$ is the
constellation size. The algorithm is also able to generate optimal max-log bit LLRs
using a parallel lattice search over $2S$ transmit sequences.

To our knowledge, no ML-approaching detection technique among those proposed so far is
able to generate optimal bit soft-output information at low complexity while also
being suitable for a parallel hardware implementation. LORD provides all these
desirable features and thus represents a significant improvement over the state of the
art in this field.